Splitting statistics

This script analyses splitting statistics for CTC clusters.

The analysis takes as input a list of trees sampled from the posterior distribution and, for each tree, samples mutation placements.

Configure the script

inputFolder <- "/Users/jgawron/Documents/projects/CTC_backup/input_folder"
simulationInputFolder <- "/Users/jgawron/Documents/projects/CTC_backup/simulations/simulations2"
treeName <- "Br16_AC"
nTreeSamplingEvents <- 1000
nMutationSamplingEvents <- 1000

Loading data

source("/Users/jgawron/Documents/projects/CTC-SCITE/CTC-SCITE/experiments/workflow/resources/functions.R")
## ── Attaching core tidyverse packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
input <- load_data(inputFolder, treeName)
## Rows: 40000 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): Tree
## dbl (4): LogScore, SequencingErrorRate, DropoutRate, LogTau
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 557 Columns: 72
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr  (3): X1, X3, X4
## dbl (69): X2, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X1...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 34 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Cluster, Description
## dbl (3): CellCount, TCs, WBCs
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
postSampling <- input$postSampling
nClusters <- input$nClusters
ClusterID <- input$clusterID
nCells <- input$nCells
nMutations <- input$nMutations
alleleCount <- input$alleleCount
mutatedReadCounts <- input$mutatedReadCounts
totalReadCounts <- input$totalReadCounts
sampleDescription <- input$sample_description

Sample description

Each row corresponds to a cell. Column description:

- Cluster: A number indicating which sample the cell belongs to.
- ClusterName: The name of the sample in the nodeDescription.tsv file.
- WBC: A binary flag indicating whether the cell is a white blood cell (1) or not (0).
- color: The color of the cluster in the tree, as described in the nodeDescription.tsv file.

print(sampleDescription)
##    Cluster ClusterName WBC            color single_cell
## 1        0    Br16_AC1   0       lightcoral       FALSE
## 2        0    Br16_AC1   0       lightcoral       FALSE
## 3        0    Br16_AC1   0       lightcoral       FALSE
## 4        1   Br16_AC10   0           gray93        TRUE
## 5        2   Br16_AC12   0       sandybrown        TRUE
## 6        3   Br16_AC13   0       sandybrown        TRUE
## 7        4   Br16_AC14   0         skyblue3       FALSE
## 8        4   Br16_AC14   0         skyblue3       FALSE
## 9        5   Br16_AC15   0          thistle       FALSE
## 10       5   Br16_AC15   0          thistle       FALSE
## 11       6   Br16_AC16   0     lemonchiffon       FALSE
## 12       6   Br16_AC16   0     lemonchiffon       FALSE
## 13       6   Br16_AC16   0     lemonchiffon       FALSE
## 14       7   Br16_AC17   0       violetred3       FALSE
## 15       7   Br16_AC17   0       violetred3       FALSE
## 16       7   Br16_AC17   0       violetred3       FALSE
## 17       7   Br16_AC17   0       violetred3       FALSE
## 18       8   Br16_AC18   0   lightslateblue       FALSE
## 19       8   Br16_AC18   0   lightslateblue       FALSE
## 20       9   Br16_AC19   0           gray93        TRUE
## 21      10   Br16_AC20   0         deeppink       FALSE
## 22      10   Br16_AC20   0         deeppink       FALSE
## 23      10   Br16_AC20   0         deeppink       FALSE
## 24      11   Br16_AC21   0           gray93        TRUE
## 25      12   Br16_AC22   0 mediumaquamarine       FALSE
## 26      12   Br16_AC22   0 mediumaquamarine       FALSE
## 27      12   Br16_AC22   0 mediumaquamarine       FALSE
## 28      13   Br16_AC24   0        mistyrose       FALSE
## 29      13   Br16_AC24   0        mistyrose       FALSE
## 30      14   Br16_AC25   0           gray93        TRUE
## 31      15   Br16_AC26   0           gray93        TRUE
## 32      16   Br16_AC27   0       powderblue       FALSE
## 33      16   Br16_AC27   0       powderblue       FALSE
## 34      17   Br16_AC28   0        steelblue       FALSE
## 35      17   Br16_AC28   0        steelblue       FALSE
## 36      17   Br16_AC28   0        steelblue       FALSE
## 37      18   Br16_AC29   0           gray93        TRUE
## 38      19    Br16_AC3   0   paleturquoise3       FALSE
## 39      19    Br16_AC3   0   paleturquoise3       FALSE
## 40      19    Br16_AC3   0   paleturquoise3       FALSE
## 41      20   Br16_AC30   0      greenyellow       FALSE
## 42      20   Br16_AC30   0      greenyellow       FALSE
## 43      21   Br16_AC33   0           khaki3       FALSE
## 44      21   Br16_AC33   0           khaki3       FALSE
## 45      22   Br16_AC34   0    darkseagreen4       FALSE
## 46      22   Br16_AC34   0    darkseagreen4       FALSE
## 47      23   Br16_AC35   0             gold        TRUE
## 48      24   Br16_AC37   0             plum       FALSE
## 49      24   Br16_AC37   0             plum       FALSE
## 50      25   Br16_AC38   0      yellowgreen        TRUE
## 51      26   Br16_AC39   0      yellowgreen       FALSE
## 52      26   Br16_AC39   0      yellowgreen       FALSE
## 53      26   Br16_AC39   0      yellowgreen       FALSE
## 54      27    Br16_AC4   0     navajowhite2       FALSE
## 55      27    Br16_AC4   0     navajowhite2       FALSE
## 56      28   Br16_AC40   0          crimson       FALSE
## 57      28   Br16_AC40   0          crimson       FALSE
## 58      28   Br16_AC40   0          crimson       FALSE
## 59      28   Br16_AC40   0          crimson       FALSE
## 60      28   Br16_AC40   0          crimson       FALSE
## 61      29    Br16_AC5   0           gray93        TRUE
## 62      30    Br16_AC6   0           gray93        TRUE
## 63      31    Br16_AC7   0        cadetblue       FALSE
## 64      31    Br16_AC7   0        cadetblue       FALSE
## 65      31    Br16_AC7   0        cadetblue       FALSE
## 66      31    Br16_AC7   0        cadetblue       FALSE
## 67      32    Br16_AC8   0    darkslategray       FALSE
## 68      32    Br16_AC8   0    darkslategray       FALSE
## 69      32    Br16_AC8   0    darkslategray       FALSE
## 70      33    Br16_AC9   0           gray93        TRUE
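
The analysis further down selects clusters by counting, per color, the cells that are not white blood cells and are not colored gray93. A minimal sketch of that counting step on a toy table follows; `toyDescription` and its values are hypothetical stand-ins with the same column names as `sampleDescription`, and base R is used instead of dplyr so the snippet runs on its own.

```r
# Toy stand-in for sampleDescription (hypothetical values, same column names)
toyDescription <- data.frame(
  Cluster = c(0, 0, 0, 1, 2, 2, 3),
  ClusterName = c("A1", "A1", "A1", "A2", "A3", "A3", "A4"),
  WBC = c(0, 0, 0, 0, 0, 0, 1),
  color = c("lightcoral", "lightcoral", "lightcoral", "gray93",
            "skyblue3", "skyblue3", "gray93"),
  stringsAsFactors = FALSE
)

# Keep cells that are not white blood cells and are not colored gray93
tumorCells <- toyDescription[toyDescription$WBC == 0 &
                               toyDescription$color != "gray93", ]

# Number of cells per color
clusterSizes <- table(tumorCells$color)
print(clusterSizes)

# Colors that occur exactly twice, i.e. candidates for cluster size 2
colorsOfSizeTwo <- names(clusterSizes)[clusterSizes == 2]
print(colorsOfSizeTwo)
```

Because the grouping key is the color, two samples that share a color are counted together.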

Get null distributions of the relevant statistics, stratified by cluster size:

cutoffsSplittingProbs <- data.frame(clusterSize = vector(), Cutoff = vector())
cutoffsBranchingProbabilities <- data.frame(clusterSize = vector(), Cutoff = vector())

for (clusterSize in 2:5){
  try({
    treeNameSimulated <- paste(treeName, clusterSize, sep = '_')


  inputSimulated <- load_data(simulationInputFolder, treeNameSimulated)

  postSamplingSimulated <- inputSimulated$postSampling
  nClustersSimulated <- inputSimulated$nClusters
  ClusterIDSimulated <- inputSimulated$clusterID
  nCellsSimulated <- inputSimulated$nCells
  nMutationsSimulated <- inputSimulated$nMutations
  alleleCountSimulated <- inputSimulated$alleleCount
  mutatedReadCountsSimulated <- inputSimulated$mutatedReadCounts
  totalReadCountsSimulated <- inputSimulated$totalReadCounts
  sampleDescriptionSimulated <- inputSimulated$sample_description
  
  distance <- computeClusterSplits(sampleDescriptionSimulated, postSamplingSimulated, treeNameSimulated, nCellsSimulated,
                     nMutationsSimulated, nClustersSimulated,
                     alleleCountSimulated,
                     mutatedReadCountsSimulated, totalReadCountsSimulated,
                     nMutationSamplingEvents = nMutationSamplingEvents, nTreeSamplingEvents = nTreeSamplingEvents,
                     cellPairSelection = c("orchid", "orchid1", "orchid2",
                                           "orchid3", "orchid4", "darkorchid",
                                           "darkorchid1","darkorchid2", "darkorchid3",
                                           "darkorchid4", "purple", "purple1",
                                           "purple2", "purple3", "purple4"))

  

  plot(ggplot(distance$splittingProbs, aes(x = "Values", y = Splitting_probability, fill = 'Splitting_probability')) +
    geom_boxplot())
  cutoffsSplittingProbs <- rbind(cutoffsSplittingProbs, data.frame(clusterSize = clusterSize, Cutoff = mean(distance$splittingProbs$Splitting_probability) + 2 * sd(distance$splittingProbs$Splitting_probability) ))
  
  ## Note: aggregatedBranchingProbabilities is computed over all pairs of cells from the same cluster,
  ## so clusters with more cells contribute more pairs and weigh more heavily on the shape of the
  ## final distribution. This is not a problem at present because counts are only aggregated over
  ## clusters of the same size, but it is a potential source of future bugs.
  
  plot(ggplot(data.frame(x = distance$aggregatedBranchingProbabilities), aes(x = x)) +
    geom_histogram(binwidth = 0.01))
  print(data.frame(clusterSize = clusterSize, Cutoff = quantile(distance$aggregatedBranchingProbabilities, probs = 0.95, names = FALSE)[1] ))
  cutoffsBranchingProbabilities <- rbind(cutoffsBranchingProbabilities, data.frame(clusterSize = clusterSize, Cutoff = quantile(distance$aggregatedBranchingProbabilities, probs = 0.95, names = FALSE)[1] ))
  })
}
## Rows: 20188 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): Tree
## dbl (4): LogScore, SequencingErrorRate, DropoutRate, LogTau
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 557 Columns: 80
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr  (3): X1, X3, X4
## dbl (77): X2, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X1...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 38 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Cluster, Description
## dbl (3): CellCount, TCs, WBCs
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## [1] "Computing genomic distances of leaves: 71 70"
## [1] "Computing the posterior distribution"

## [1] "Computing genomic distances of leaves: 73 72"
## [1] "Computing the posterior distribution"

## [1] "Computing genomic distances of leaves: 75 74"
## [1] "Computing the posterior distribution"

## [1] "Computing genomic distances of leaves: 77 76"
## [1] "Computing the posterior distribution"

##   clusterSize    Cutoff
## 1           2 0.9999968
## Rows: 19114 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): Tree
## dbl (4): LogScore, SequencingErrorRate, DropoutRate, LogTau
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 557 Columns: 78
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr  (3): X1, X3, X4
## dbl (75): X2, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X1...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 37 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Cluster, Description
## dbl (3): CellCount, TCs, WBCs
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## [1] "Computing genomic distances of leaves: 71 70"
## [1] "Computing the posterior distribution"

## [1] "Computing genomic distances of leaves: 74 73"
## [1] "Computing the posterior distribution"

## [1] "Computing genomic distances of leaves: 77 76"
## [1] "Computing the posterior distribution"

##   clusterSize    Cutoff
## 1           3 0.9998601
## Rows: 16812 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): Tree
## dbl (4): LogScore, SequencingErrorRate, DropoutRate, LogTau
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 557 Columns: 76
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr  (3): X1, X3, X4
## dbl (73): X2, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X1...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 36 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Cluster, Description
## dbl (3): CellCount, TCs, WBCs
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## [1] "Computing genomic distances of leaves: 71 70"
## [1] "Computing the posterior distribution"

## [1] "Computing genomic distances of leaves: 75 74"
## [1] "Computing the posterior distribution"

##   clusterSize    Cutoff
## 1           4 0.9991121
## Rows: 17901 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): Tree
## dbl (4): LogScore, SequencingErrorRate, DropoutRate, LogTau
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 557 Columns: 76
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr  (3): X1, X3, X4
## dbl (73): X2, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X1...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 36 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Cluster, Description
## dbl (3): CellCount, TCs, WBCs
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## [1] "Computing genomic distances of leaves: 71 70"
## [1] "Computing the posterior distribution"

## [1] "Computing genomic distances of leaves: 76 75"
## [1] "Computing the posterior distribution"

##   clusterSize    Cutoff
## 1           5 0.9706048
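
The loop above derives two cutoffs per cluster size: the mean plus two standard deviations of the null splitting probabilities, and the empirical 95% quantile of the null aggregated branching probabilities. The sketch below applies both rules to a small made-up vector; `nullValues` is hypothetical and not simulation output. It illustrates one property worth keeping in mind: for a statistic bounded above by 1, the mean-plus-two-sd rule can yield a cutoff greater than 1, which no observed value can then exceed.

```r
# Hypothetical null values of a statistic bounded by 1 (illustrative only)
nullValues <- c(rep(1, 18), 0.95, 0.90)

# Rule 1: mean plus two standard deviations (used for the splitting probabilities)
cutoffMeanSd <- mean(nullValues) + 2 * sd(nullValues)

# Rule 2: empirical 95% quantile (used for the aggregated branching probabilities)
cutoffQuantile <- quantile(nullValues, probs = 0.95, names = FALSE)

print(cutoffMeanSd)
print(cutoffQuantile)

# Check whether each cutoff lies inside the attainable range of the statistic
print(cutoffMeanSd > 1)
print(cutoffQuantile > 1)
```

On this toy input the first cutoff lies above 1 while the quantile-based cutoff does not. The splitting-probability cutoff of 1.004274 printed for cluster size 2 in the next section also lies above 1, so, assuming Splitting_probability is bounded by 1, that rule cannot flag any cluster of that size.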

Compute the relevant statistics for each cluster in the dataset and report the number of oligoclonal clusters:

nTumorClusters <- 0
nOligoclonalClusters1 <- 0
nOligoclonalClusters2 <- 0
splittingSummary1 <- data.frame(Color = vector(), Oligoclonal = vector(), ClusterSize = vector())
splittingSummary2 <- data.frame(Color = vector(), Oligoclonal = vector(), ClusterSize = vector())

for(clusterSize in 2:5){
  try({
    clusterColor <- sampleDescription %>%
      filter(WBC == 0 & color != 'gray93') %>%
      group_by(color) %>%
      filter(n() == clusterSize) %>%
      pull(color) %>%
      unique()
    
    for(color in clusterColor){
      distance <- computeClusterSplits(sampleDescription, postSampling, treeName, nCells,
                     nMutations, nClusters,
                     alleleCount,
                     mutatedReadCounts, totalReadCounts,
                     nMutationSamplingEvents = nMutationSamplingEvents, nTreeSamplingEvents = nTreeSamplingEvents,
                     cellPairSelection = c(color))

      splittingProbs <- mean(distance$splittingProbs$Splitting_probability)
      branchingProbs <- mean(distance$aggregatedBranchingProbabilities)
    
      nTumorClusters <- nTumorClusters + 1
      oligoclonal <- FALSE
      print(clusterSize)
      print(cutoffsSplittingProbs[(cutoffsSplittingProbs$clusterSize == clusterSize), 2])
      if(splittingProbs > (cutoffsSplittingProbs[(cutoffsSplittingProbs$clusterSize == clusterSize), 2])){
        nOligoclonalClusters1 <- nOligoclonalClusters1 + 1
        oligoclonal <- TRUE
      }
      splittingSummary1 <- rbind(splittingSummary1, data.frame(Color = color, Oligoclonal = oligoclonal, ClusterSize = clusterSize))
      oligoclonal <- FALSE
      if(branchingProbs > cutoffsBranchingProbabilities[(cutoffsBranchingProbabilities$clusterSize == clusterSize), 2]){
        nOligoclonalClusters2 <- nOligoclonalClusters2 + 1
        oligoclonal <- TRUE
      }
      splittingSummary2 <- rbind(splittingSummary2, data.frame(Color = color, Oligoclonal = oligoclonal, ClusterSize = clusterSize))
    }
  })
}
## [1] "Computing genomic distances of leaves: 5 4"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 7 6"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 9 8"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 18 17"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 28 27"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 32 31"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 41 40"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 43 42"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 45 44"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 48 47"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 54 53"
## [1] "Computing the posterior distribution"

## [1] 2
## [1] 1.004274
## [1] "Computing genomic distances of leaves: 1 0"
## [1] "Computing the posterior distribution"

## [1] 3
## [1] 0.9979337
## [1] "Computing genomic distances of leaves: 11 10"
## [1] "Computing the posterior distribution"

## [1] 3
## [1] 0.9979337
## [1] "Computing genomic distances of leaves: 21 20"
## [1] "Computing the posterior distribution"

## [1] 3
## [1] 0.9979337
## [1] "Computing genomic distances of leaves: 25 24"
## [1] "Computing the posterior distribution"

## [1] 3
## [1] 0.9979337
## [1] "Computing genomic distances of leaves: 34 33"
## [1] "Computing the posterior distribution"

## [1] 3
## [1] 0.9979337
## [1] "Computing genomic distances of leaves: 38 37"
## [1] "Computing the posterior distribution"

## [1] 3
## [1] 0.9979337
## [1] "Computing genomic distances of leaves: 67 66"
## [1] "Computing the posterior distribution"

## [1] 3
## [1] 0.9979337
## [1] "Computing genomic distances of leaves: 14 13"
## [1] "Computing the posterior distribution"

## [1] 4
## [1] 0.9887489
## [1] "Computing genomic distances of leaves: 50 49"
## [1] "Computing the posterior distribution"

## [1] 4
## [1] 0.9887489
## [1] "Computing genomic distances of leaves: 63 62"
## [1] "Computing the posterior distribution"

## [1] 4
## [1] 0.9887489
## [1] "Computing genomic distances of leaves: 56 55"
## [1] "Computing the posterior distribution"

## [1] 5
## [1] 0.9914817
numberOfCancerClusters <- sampleDescription %>%
  filter(WBC == 0 & color != 'gray93') %>%
  group_by(color) %>%
  filter(n() > 1) %>%
  pull(color) %>%
  unique() %>%
  length()

print(sprintf('%d out of %d clusters were found to be oligoclonal in %s, using method 1', nOligoclonalClusters1, numberOfCancerClusters, treeName))
## [1] "0 out of 22 clusters were found to be oligoclonal in Br16_AC, using method 1"
print(sprintf('%d out of %d clusters were found to be oligoclonal in %s, using method 2', nOligoclonalClusters2, numberOfCancerClusters, treeName))
## [1] "0 out of 22 clusters were found to be oligoclonal in Br16_AC, using method 2"
print(splittingSummary1)
##               Color Oligoclonal ClusterSize
## 1        sandybrown       FALSE           2
## 2          skyblue3       FALSE           2
## 3           thistle       FALSE           2
## 4    lightslateblue       FALSE           2
## 5         mistyrose       FALSE           2
## 6        powderblue       FALSE           2
## 7       greenyellow       FALSE           2
## 8            khaki3       FALSE           2
## 9     darkseagreen4       FALSE           2
## 10             plum       FALSE           2
## 11     navajowhite2       FALSE           2
## 12       lightcoral       FALSE           3
## 13     lemonchiffon       FALSE           3
## 14         deeppink       FALSE           3
## 15 mediumaquamarine       FALSE           3
## 16        steelblue       FALSE           3
## 17   paleturquoise3       FALSE           3
## 18    darkslategray       FALSE           3
## 19       violetred3       FALSE           4
## 20      yellowgreen       FALSE           4
## 21        cadetblue       FALSE           4
## 22          crimson       FALSE           5
print(splittingSummary2)
##               Color Oligoclonal ClusterSize
## 1        sandybrown       FALSE           2
## 2          skyblue3       FALSE           2
## 3           thistle       FALSE           2
## 4    lightslateblue       FALSE           2
## 5         mistyrose       FALSE           2
## 6        powderblue       FALSE           2
## 7       greenyellow       FALSE           2
## 8            khaki3       FALSE           2
## 9     darkseagreen4       FALSE           2
## 10             plum       FALSE           2
## 11     navajowhite2       FALSE           2
## 12       lightcoral       FALSE           3
## 13     lemonchiffon       FALSE           3
## 14         deeppink       FALSE           3
## 15 mediumaquamarine       FALSE           3
## 16        steelblue       FALSE           3
## 17   paleturquoise3       FALSE           3
## 18    darkslategray       FALSE           3
## 19       violetred3       FALSE           4
## 20      yellowgreen       FALSE           4
## 21        cadetblue       FALSE           4
## 22          crimson       FALSE           5
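
The decision rule used for both summaries compares each cluster's statistic with the cutoff for its cluster size. As a minimal sketch, the same comparison can be written as a table join rather than a loop; all numbers and the names `cutoffs`, `observed` and `summaryTable` below are hypothetical and chosen only for illustration.

```r
# Hypothetical per-size cutoffs (illustrative numbers, not results of this analysis)
cutoffs <- data.frame(clusterSize = 2:4, Cutoff = c(0.99, 0.95, 0.90))

# Hypothetical observed statistic per cluster
observed <- data.frame(
  Color = c("skyblue3", "lightcoral", "violetred3"),
  clusterSize = 2:4,
  Statistic = c(0.50, 0.97, 0.10),
  stringsAsFactors = FALSE
)

# Attach the matching cutoff to each cluster; all.x = TRUE keeps clusters
# whose size has no cutoff (their Cutoff becomes NA instead of raising an error)
summaryTable <- merge(observed, cutoffs, by = "clusterSize", all.x = TRUE)

# A cluster is flagged when its statistic exceeds the cutoff for its size
summaryTable$Oligoclonal <- summaryTable$Statistic > summaryTable$Cutoff

print(summaryTable[, c("Color", "Oligoclonal", "clusterSize")])
print(sum(summaryTable$Oligoclonal, na.rm = TRUE))
```

One reason to consider the join: if the null simulation for some cluster size failed inside `try()`, the lookup in the loop returns an empty vector and the `if` comparison errors, whereas the join yields `NA` for that cluster and the remaining rows are still evaluated.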